DOCUMENT RESUME

ED 198 177

AUTHOR          Leinhardt, Gaea; Seewald, Andrea Mar
TITLE           Overlap: What's Tested, What's Taught?
INSTITUTION     Pittsburgh Univ., Pa. Learning Research and Development Center.
SPONS AGENCY    National Inst. of Education (ED), Washington, D.C.
REPORT NO       LRDC-1980/16
PUB DATE        Jun 80
NOTE            32p.
EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     Curriculum Evaluation; *Evaluation [...]; *Instruction; *Program Effectiveness; [...] Attitudes; *Testing Problems
IDENTIFIERS     Test Curriculum Overlap

ABSTRACT

In studying the effectiveness of different instructional practices or programs, it is possible that an outcome measure may be biased in favor of a particular practice or program because the overlap between the test and one program is greater than for the other(s). Two basic approaches for dealing with this issue are reviewed. The first approach is a systematic analysis of curricula and tests to help guide test selection. A second approach to overlap is to measure the degree of overlap directly and to incorporate such a measure into the analysis. This approach could be used in conjunction with the first, or alone. Also reviewed are two ways to directly measure overlap which have been developed. The first involves teacher interviews or questionnaires. The second involves analyzing the curriculum to assess if information required by the test has been covered by the curriculum. The teacher interview approach reflects both informal in-class instruction and curriculum-based instruction, but it may also include the teacher's expectation about student competency. Both approaches [...] in predicting final test performance. (Author/EL)




University of Pittsburgh

OVERLAP: WHAT'S TESTED, WHAT'S TAUGHT?

Gaea Leinhardt and Andrea Mar Seewald

Learning Research and Development Center
University of Pittsburgh

June, 1980

The research reported herein was supported by the Learning Research
and Development Center, supported in part by funds from the National
Institute of Education (NIE), United States Department of Health,
Education, and Welfare. The opinions expressed do not necessarily
reflect the position or policy of NIE, and no official endorsement
should be inferred.






Abstract

In studying the effectiveness of different instructional
programs, it is possible for a criterion measure to favor one program
over the others because the overlap between the criterion test and
program content is greater. If the overlap is not controlled for, one
program may look artificially better with respect to test performance
than others. Several approaches to dealing with overlap are reviewed
and two techniques for its estimation, in the context of actual
educational evaluations, are explored.






Considering the extent to which school children are tested in the
United States, and the myriad decisions that are based on the results
of such tests, it is important to understand what generates variations
in test performance. Decisions range from those that affect the
individual student to those that affect a school or district.
Examples of individual decisions include: passing a student on to new
material, promotion to the next grade, and eligibility for placement
in special classes. Examples of district level decisions include:
choosing a new curriculum, selecting schools for special services, or
adopting a new compensatory program. Because testing information is
the basis for individual and program evaluation, understanding the
real meaning of a test score is of considerable importance.

The score received on a "good" achievement test is meant to
reflect the actual knowledge that an individual or group has about the
domain from which that test was drawn. That is, a good test is
assumed to be a random sample of items from a specified domain, where
domain includes both content covered and item form. The domain of
instruction may be identical to the domain sampled by the test, it may
partially overlap or share the domain of the test, or it may be
totally different. When a set of test scores is used to help
evaluate the impact of instructional programs, knowledge about the
extent of overlap is critical to interpretation of the results. If
different instructional programs have different overlap with the
criterion measure, then results can be biased in favor of the program
with the higher overlap. The purpose of this paper is to review
approaches to the problem of dealing with overlap, to suggest methods
for measuring overlap between test and curriculum, and to examine the
implications of overlap for evaluation studies.



Approaches to the Problem of Test-Curriculum Overlap

Educators, especially curriculum designers, have long been aware
that tests and curricula can and often do emphasize different aspects
of a particular knowledge domain (Cole & Nitko, 1979; Rosenshine,
1978; Walker & Schaffarzick, 1974). Awareness of the problem of the
fit or lack of it between curricula and tests has generated four types
of solutions: first, to build new "criterion-based" tests for each
situation; second, to alter existing tests by deleting items that do
not reflect curriculum content; third, to systematically analyze
curricula and tests in order to select the best existing test; and
fourth, to directly measure the relationship and incorporate it into
an analysis.






Tests to Match Curricula

Popham (1978) has been the main proponent for the construction of
"new" criterion-referenced tests to accurately reflect the content of
different curricula. This approach is advantageous because it insures
maximum fit. However, program evaluations often involve multiple
contrasts. When this is the case, the approach of building specific
tests must be modified to produce a single test that is in some sense
comparable or fair to all curricula. For example, if four programs
were to be contrasted, each of which used a different curriculum, a
battery could be constructed by selecting items unique to each
curriculum, shared by each curriculum, and untaught by any curriculum.
Of course, in order to be fair, the test should be equally weighted
for each curriculum in terms of the proportion of items of each type.
Such a test would assure that some predetermined quantity of the test
had been covered and that some proportion of the content covered was
actually tested. Kugle and Calkins (1976) describe a procedure for
developing criterion-referenced tests for fifth grade math and social
studies classrooms by matching subtests and curricular objectives.
Test construction seemed difficult and costly and would be so in most
cases. Also, in the work of Kugle and Calkins, only one instructional
program was being used. Most program interventions use several
curricula in different configurations, which further complicates test
construction. If, however, only one or two curricula are being
contrasted, building new tests or modifying existing ones may be
straightforward and useful. There are, of course, tremendous
advantages in moving from norm-referenced to criterion-referenced
standardized tests. The issue here is only whether to build new tests
"on the spot" or not.

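The balanced battery described above can be sketched in a few lines. This is only an illustration under stated assumptions: the item pool, the curriculum names, and the helper names `classify_items` and `balanced_battery` are hypothetical, and items taught by some but not all curricula are simply set aside, a choice the text does not address.

```python
def classify_items(item_curricula, curricula):
    """Label each item by which curricula teach its content."""
    everyone = set(curricula)
    labels = {}
    for item, taught_in in item_curricula.items():
        taught_in = set(taught_in)
        if not taught_in:
            labels[item] = "untaught"
        elif taught_in == everyone:
            labels[item] = "shared"
        elif len(taught_in) == 1:
            labels[item] = "unique:" + next(iter(taught_in))
        else:
            # Taught by some but not all curricula; left out (an assumption).
            labels[item] = "partial"
    return labels


def balanced_battery(labels, curricula, per_type):
    """Take the same number of items from every item type."""
    wanted = ["unique:" + c for c in curricula] + ["shared", "untaught"]
    battery = []
    for kind in wanted:
        pool = sorted(item for item, label in labels.items() if label == kind)
        if len(pool) < per_type:
            raise ValueError("not enough items of type " + kind)
        battery.extend(pool[:per_type])
    return battery


# Hypothetical item pool: item id -> curricula that teach the item's content.
curricula = ["A", "B"]
pool = {
    "i1": ["A"], "i2": ["A"], "i3": ["B"], "i4": ["B"],
    "i5": ["A", "B"], "i6": ["A", "B"], "i7": [], "i8": [],
}
labels = classify_items(pool, curricula)
battery = balanced_battery(labels, curricula, per_type=1)
print(battery)
```

With equal numbers of items per type, each curriculum has the same proportion of the battery covered, which is the sense of "equally weighted" used in the paragraph.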
Alter Existing Tests

Another approach is to take existing test batteries and delete
items not included in a particular curriculum or to record separate
scores for material taught and not taught. While such an approach may
be quite informative for formative evaluations when developers are in
a curriculum revision mode, it is less useful for summative work.
First, if several curricula are contrasted, then selecting items for
inclusion or exclusion is as difficult as constructing new test items.
Second, "messing" with a test by removing items dilutes or destroys
the desirable psychometric properties of that test. Third, deleting
items does not guarantee that what is taught gets tested, only that
what is tested was taught.

Selecting the Best-Fitting Test

Formal attempts to deal directly with the overlap problem by
identifying the best-fitting test have developed along two rather
different lines: detailed curriculum analysis and teacher-based
estimations. The curriculum analysis approach has grown up around
identification of the best test selection procedure; the teacher
estimation approach has grown up around the evaluation of different
instructional programs. Both approaches can be conducted at any level
of analysis thought appropriate (student, instructional group, class,
or school; item, subtest, or total test).




Curriculum analyses have tended to involve matches between detailed
scope and sequence charts and test descriptions of content covered
(Armbruster, Stevens, & Rosenshine, 1977; Everett, 1976; Kugle &
Calkins, 1976; Pidgeon, 1970). These analyses are usually conducted at
the total test and total curriculum level; they do not include
information on how much material was actually covered in instruction
by the school, class, or student. These analyses often look at both
sides of the overlap question: how much of the test has been covered
by the curriculum as well as how much of the curriculum was tested. In
the Armbruster et al. (1977) analysis, the relationship between 3
curricula and 2 tests ranged from .10 to .43 using 6 out of 16
categories shared by all curricula and tests, but only a small
percentage of skills taught were tested. In analyses of this type, it
is assumed that the same or similar labels (e.g., detail, paraphrase,
main idea, etc.) refer to the same content and that different labels
refer to different content. In spite of rather gross measures of
curriculum and test content, the approach clearly revealed several
things. First, introductory reading curricula cover remarkably
different content. (Beck and McCaslin, 1978, also showed this
dramatically.) Second, tests (or at least those reviewed) cover a more
similar range of topics than curricula. Third, very little of what is
taught ever gets tested. Fourth, it is important to use such
information in selecting a test for program evaluation.

Joseph Jenkins and Darlene Pany (1976) also noted the differences
between content covered in reading curricula and the content of
standardized reading tests (or subtests). Their analysis of seven
commercial reading series and five reading achievement tests was
conducted by matching the words presented in each curriculum with the
words that appeared on each test. Results revealed "curriculum bias"
between tests for a single curriculum as well as on a single test for
different reading curricula. In their discussion of the implications
of these findings for educational research, they suggest that one must
either control for curricula across treatment conditions or develop
tests that are curriculum-based (or criterion-referenced).
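The word-matching idea described above can be illustrated with a small sketch. The word lists, series names, and test names below are invented for illustration and are not the materials analyzed by Jenkins and Pany; `word_overlap` is a hypothetical helper.

```python
def word_overlap(curriculum_words, test_words):
    """Percentage of distinct test words that the curriculum presents."""
    curriculum = {w.lower() for w in curriculum_words}
    test = {w.lower() for w in test_words}
    if not test:
        raise ValueError("test word list is empty")
    return 100.0 * len(test & curriculum) / len(test)


# Hypothetical word lists for two reading series and two tests.
series = {
    "Series A": ["cat", "dog", "run", "jump", "red"],
    "Series B": ["cat", "sun", "blue", "run", "tree"],
}
tests = {
    "Test 1": ["cat", "dog", "run", "sun"],
    "Test 2": ["blue", "tree", "sun", "cat"],
}

# One overlap percentage for every (series, test) pair.
table = {
    (s, t): word_overlap(s_words, t_words)
    for s, s_words in series.items()
    for t, t_words in tests.items()
}
for pair, pct in sorted(table.items()):
    print(pair, pct)
```

In this toy case Test 1 covers both series equally (75 percent each), while Test 2 covers 25 percent of its words for Series A and 100 percent for Series B, which is the kind of bias the paragraph describes.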

More recently, Porter and his colleagues at the Institute for
Research on Teaching have been analyzing tests and curricula in
elementary mathematics (Floden, Porter, Schmidt, & Freeman, 1978;
Kuhs, Schmidt, Porter, Floden, Freeman, & Schwille, 1979; Porter,
1978; Porter, Schmidt, Floden, & Freeman, 1978 a & b; Schmidt, 1978;
Schwille, Porter, & Gant, 1979). In this impressive line of work, a
detailed taxonomy of elementary mathematics topics was constructed
which can be used to map out tests and curricula. As Porter et al.
point out, district-level decision makers need to be keenly aware of
the relationship between tests and curricula if they are to make sense
of test results from different schools. What this line of work would
ultimately lead to is policy discussion about appropriate content to
be covered by instruction and subsequently to be tested. Thus, it
could potentially inform decision-makers regarding curriculum design
and selection and test selection. The main drawbacks of this approach
for program evaluation are that it considers only material included in
a formal curriculum, not teacher presentation, and does not yield a [...]



A different approach to the overlap problem has emerged from
program evaluation work in which the problems of multiple (in some
cases, multi-functional) curricula, a selected test battery, and
interpretation of results are prevalent. In an evaluation, if a
particular test is used that reflects the content of one curriculum
more than another, the test can be considered to be biased in favor of
that curriculum. This seems obvious, but the precise measurement of
how much more a test reflects the content of one curriculum than
another is usually not undertaken, nor is it usually included in the
analysis of results. Just as it is inappropriate to examine posttest
differences without pretest information, it is also inappropriate to
ignore variation in the opportunity to learn the material that is
being tested. Failure to consider variation in the opportunity to
learn the material may lead to the attribution of differences to
programs or student characteristics when, in fact, the differences may
lie in the match (or mismatch) between what is being tested and what
has been taught.

The earliest work in actually measuring overlap was by Husén
(1967), Chang and Raths (1971), and Comber and Keeves (1973). In a
fashion reminiscent of the scope and sequence charts, Comber and
Keeves asked teachers to estimate the quantity of a test covered by
instruction (teacher presentation plus curriculum) for an entire
school. Husén apparently obtained slightly more detailed estimates by
obtaining percentages of students that covered each test item, but the
procedures were unevenly used across the study. These measures [...]
the other way around. The measure obtained was too gross to
adequately account for variation in student performance and was not
included in the analyses, but was used as a guide for the
interpretation of results.

The design of the Instructional Dimensions Study (IDS) included
estimates of opportunity to learn content that was tested on a
standardized battery (Cooley & Leinhardt, 1975; 1980). This
measure was to be included as a covariate in the analysis of the
effectiveness of individualized instruction for compensatory programs.
In implementing this aspect of the design, Lee Poynor (1978) used two
basic estimation approaches, one based on teacher interviews at the
end of the year, the other on a curriculum analysis of material
covered by students. Both approaches are limited to estimating how
much of the test was covered, not how much of what was taught was
tested. In the next section, we turn to the specific details of
measuring overlap directly.



Measuring Overlap

Teacher-Based Measures of Overlap

In IDS, the teacher estimate was obtained for every item of the
test but at the class level (Poynor, 1978). Teachers were asked to
estimate the percentage of students that had been taught the minimum
material necessary to pass the item. Percentage of overlap scores
were obtained by a conversion process that yields a somewhat ambiguous
number. The number reflects both the percentage of the test taught
and the percentage of students that were taught it. Given that
analyses were conducted at the classroom level, the information, while
still less fine-grained than one might wish, was both usable and
informative (Cooley & Leinhardt, 1980).

Another approach to obtaining a teacher estimate of overlap has
been used in several instructional effectiveness studies (Cooley,
Leinhardt, & Zigmond, 1979; Leinhardt & Engel, 1980; Leinhardt,
Zigmond, & Cooley, 1980). In this approach, two components of a test
item are considered, content and format. The teacher is asked to
identify for each student (or a sample of students) whether or not the
student has been taught the information required to answer the item,
as well as whether or not the student has been exposed to the type of
format the item employs. The teacher is not being asked whether s/he
taught to the item; rather, s/he is being asked whether the student has
been taught the information the item is testing. For younger
children, this includes familiarity with format. The teacher does
this task for each student and each item; the time required to
complete this task is minimal (approximately 30 minutes for 10
children). The teacher estimates include information about content
covered through curriculum and teacher presentation, but may also
include information about a teacher's expectation of success for a
given student. The teacher estimates do not include information on
how much of what was taught was tested.







Curriculum-Based Measures of Overlap

In order to avoid the problem of teacher bias, another technique
has been developed: computer-based curriculum analysis. This approach
combines teacher sources of information with curriculum analysis. To
use this approach, each item of a test is analyzed to assess what
information is needed to "pass" the item. Information on the
curricula used and the location (beginning and end) of each student in
each curriculum is obtained. A dictionary of test-relevant
information is then constructed (for example, vocabulary words
presented in texts) for each student, and the dictionaries are matched
with the item information to determine what percentage of the test has
been covered through curriculum presentation. A similar system was
used in IDS by Lee Poynor (1978). When multiple curricula are
involved, even a small study involving 52 students (Cooley et al.,
1979) can become very difficult. Vast amounts of information on each
curriculum must be entered, sorted, and merged with student files.
Further, while teachers can make some judgment about instructional
adequacy for paragraph comprehension, computer-based curriculum
analyses have so far been extremely conservative, and have drastically
underestimated instructional coverage.

In summary, there are two basic ways in which estimates of
instructional overlap with criterion measures have been obtained:
teacher responses and computer-based analyses. Regardless of which
approach is used, the information can be collected at the student,
instructional unit, class, grade, or school level; the entire test, or
some smaller portion of it, may be the focus of the initial measure.
Thus, overlap can range from an estimate of what percentage of the
total test has been covered by an entire school to which items have
been covered by one student. Obviously, if data are collected at the
student by item level, they can be aggregated to higher levels at the
time of analysis.
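The aggregation point can be made concrete with a short sketch. The records below are invented, and the tuple layout (class, student, item, covered) is an assumption chosen only for illustration; `aggregate` is a hypothetical helper.

```python
def aggregate(records, key):
    """Percent of student-by-item records marked covered, grouped by key."""
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record[3])
    return {k: 100.0 * sum(v) / len(v) for k, v in groups.items()}


# Hypothetical student-by-item data: (class, student, item, covered 0/1).
records = [
    ("c1", "s1", "q1", 1), ("c1", "s1", "q2", 1),
    ("c1", "s1", "q3", 0), ("c1", "s1", "q4", 0),
    ("c1", "s2", "q1", 1), ("c1", "s2", "q2", 0),
    ("c1", "s2", "q3", 0), ("c1", "s2", "q4", 0),
    ("c2", "s3", "q1", 1), ("c2", "s3", "q2", 1),
    ("c2", "s3", "q3", 1), ("c2", "s3", "q4", 0),
]

by_student = aggregate(records, key=lambda r: r[1])  # percent of test per student
by_class = aggregate(records, key=lambda r: r[0])    # pooled over students in a class
by_item = aggregate(records, key=lambda r: r[2])     # percent of students per item
print(by_student, by_class, by_item)
```

The same records yield a student-level, class-level, or item-level measure depending only on the grouping key, whereas data collected at the class level cannot be broken back down to students.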

Overlap in Evaluative Research

One of the important uses of overlap is in interpreting the
results of an evaluation or of evaluative research. In this case,
overlap is acting as a covariate in much the same way as pretest
scores do. In a regression sense, the equation is:

$$Y = A + b_1 X_1 + b_2 X_2 + b_3 X_3$$

where $Y$ is a posttest score, $A$ is the intercept, $X_1$ is the
pretest, $X_2$ is overlap, $X_3$ is program membership or some cluster
of process variables such as instructional time, teacher behaviors,
etc., and $b_1$, $b_2$, $b_3$ are the corresponding regression
coefficients. Figure 1 displays these variables in a causal map.
Criterion performance is considered to be affected by what students
knew initially, overlap, program membership and/or instructional
processes; in addition, overlap is affected by what students knew
initially and possibly by program membership. Of course, $X_3$,
instructional process, may itself be more complexly modeled and, in
some cases, causally linked to $X_2$. It should be noted that the
general regression equation above does not test the model in Figure 1;
it only tests the arrows impinging directly on posttest.
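A small sketch may help show why the overlap covariate matters in such a regression. Everything here is an assumption made for illustration: the data are synthetic and noise-free, the generating coefficients (10, 0.6, 0.2, 5) are invented, and `ols` is a hypothetical helper that solves the normal equations in plain Python rather than the procedure used in the studies discussed.

```python
def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic cases: (pretest, overlap, program). Program 1 happens to have
# higher overlap with the test than program 0.
rows = [
    (20, 30, 0), (35, 60, 1), (50, 40, 0), (45, 80, 1),
    (60, 55, 0), (30, 70, 1), (55, 90, 1), (25, 45, 0),
]
# Invented generating model: posttest = 10 + .6 pretest + .2 overlap + 5 program.
y = [10 + 0.6 * p + 0.2 * o + 5 * g for p, o, g in rows]

full = ols(rows, y)                               # pretest, overlap, program
short = ols([(p, g) for p, o, g in rows], y)      # overlap omitted
print("program effect with overlap:   ", round(full[3], 3))
print("program effect without overlap:", round(short[2], 3))
```

Because program 1 has more overlap in this toy data set, leaving overlap out of the equation credits the program with part of what is really an opportunity-to-learn difference, so the program coefficient is inflated.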



Figure 1. A Causal Map of Overlap in Evaluative Research
(Diagram not reproduced. Surviving node label: Instructional Program or Processes.)


The importance of overlap for any given study is contingent upon
many elements: the accuracy of estimation; the degree to which what
has been taught has been learned; the degree to which what has not
been taught has not been learned; and the complexity or
non-hierarchical nature of the subject matter domain. The first three
of these elements are somewhat self-explanatory; the fourth is less
so. It is likely that as subject matter complexity increases and
hierarchy decreases, overlap will be both less important and more
difficult to measure with respect to a single instructional exposure
such as a year. For example, a student's ability to write a cohesive
argument about Macbeth as a tragic figure in Shakespeare, or to
describe the relationship of membership in a cross-cousin matrilineal
society and yam growing to ego development among the Trobriand
Islanders, depends not only on the student having been exposed to
Macbeth, Shakespearean drama, Freud, and Malinowski, but probably also
on a vast number of other elements that permit students confronted
with a new task or tasks to vary in their responses. However,
elementary education in the basic skills (reading and arithmetic) is
somewhat easier to analyze for purposes of estimating overlap.

Teacher-Based Estimates of Overlap

As previously mentioned, IDS included two estimates of overlap.
In both cases, the pre and posttest of interest was the Comprehensive
Tests of Basic Skills (CTBS) (CTB/McGraw-Hill, 1973). Approximately
400 first and third grade classrooms were studied during reading and
mathematics instruction. The teacher's estimate of overlap was
obtained by asking teachers to determine the percentage of students
who had been taught the information required by each item and then
averaging across items to get a percentage of the test that had been
covered. The means, standard deviations, correlations and regressions
of the teachers' estimates of overlap are reported in Table 1.
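The class-level computation just described (a percentage of students per item, averaged across items) can be sketched as follows. The per-item percentages are invented, and `class_overlap` is a hypothetical helper, not the conversion process actually used in IDS.

```python
def class_overlap(percent_taught_by_item):
    """Average, across items, of the percent of students taught each item."""
    values = list(percent_taught_by_item)
    if not values:
        raise ValueError("no items supplied")
    if any(v < 0 or v > 100 for v in values):
        raise ValueError("percentages must lie between 0 and 100")
    return sum(values) / len(values)


# Two hypothetical classrooms with very different coverage patterns.
half_the_test_for_everyone = [100, 100, 0, 0]
whole_test_for_half_the_class = [50, 50, 50, 50]

print(class_overlap(half_the_test_for_everyone))
print(class_overlap(whole_test_for_half_the_class))
```

Both classrooms receive the same score of 50, which illustrates why such a number mixes the percentage of the test taught with the percentage of students taught it.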

As can be seen in Table 1, before any program information has been
included, pretest and overlap explain considerable and significant
portions of the variance. Three of the means hover around 50 percent
with 20 as a standard deviation; correlations with pretest are about
.3 and with posttest about .4. The increase in R² from first to third
grade is largely due to the stronger relationship between pre and
posttest in the higher grades. This is reflected not only by the zero
order correlations but also in the greater magnitude of the
coefficients and the smaller standard errors. Also worth noting is
how atypical first grade math is, both in mean overlap estimates and
in R². This, along with other information, represents a warning
signal that first grade math results are of some concern.






Table 1

Analysis of Teachers' Estimates of Overlap (IDS)

                                Overlap                 Correlations
                                                 Overlap    Overlap    Pretest
                                                  with       with       with
                        n      mean     s.d.     Pretest   Posttest   Posttest

Grade 1 Read (R1)      104    50.93    18.46       .33        .47        .50
Grade 1 Math (M1)       84    27.59    12.33       .32        .37        .39
Grade 3 Read (R3)      109    51.12    21.33       .34        .38        .86
Grade 3 Math (M3)      116    56.14    19.40       .41        .51        .78

Regression Equations(a)                                           Adjusted R²

Posttest(R1) = 30.9 + .63 Pretest (.14) + .17 Overlap (.04)           .34
Posttest(M1) = 20.1 + .52 Pretest (.19) + .13 Overlap (.05)           .20
Posttest(R3) = 15.8 + .89 Pretest (.06) + .05 Overlap (.02)           .74
Posttest(M3) = 16.0 + .91 Pretest (.08) + .13 Overlap (.03)           .66

(a) All of the coefficients and adjusted R² are significant at or below .05.
Standard errors are in parentheses.
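As a rough reading aid for Table 1, each coefficient can be divided by its standard error. The sketch below uses only the values printed in the table; the interpretation that a ratio above about 2 corresponds to significance near the .05 level with samples of this size is a general rule of thumb, not a computation reported by the authors.

```python
# (coefficient, standard error) pairs copied from Table 1.
table1 = {
    "Grade 1 Read": {"pretest": (0.63, 0.14), "overlap": (0.17, 0.04)},
    "Grade 1 Math": {"pretest": (0.52, 0.19), "overlap": (0.13, 0.05)},
    "Grade 3 Read": {"pretest": (0.89, 0.06), "overlap": (0.05, 0.02)},
    "Grade 3 Math": {"pretest": (0.91, 0.08), "overlap": (0.13, 0.03)},
}

# Ratio of each coefficient to its standard error.
ratios = {
    group: {name: coef / se for name, (coef, se) in terms.items()}
    for group, terms in table1.items()
}

for group, terms in ratios.items():
    print(group, {name: round(value, 2) for name, value in terms.items()})
```

Every ratio exceeds 2, which is consistent with the table note that all coefficients are significant at or below .05.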






More recently, we have been involved in a study of reading in
elementary level Learning Disabilities classrooms. This study was
begun during the 1977-1978 school year and continued through 1978-1979
(Cooley, W. W., Leinhardt, G., & Zigmond, N., 1979; Leinhardt, G.,
Zigmond, N., & Cooley, W. W., 1980). In the first year, the CTBS was
used as both the pre and posttest measure; during the 1978-1979
phase, the pretest used was the Spache Diagnostic Reading Scales
(Spache, 1972). Overlap data were collected at the student by item
level. Teachers were asked, for each student, to circle items (on the
reading subtests of the CTBS) that contained content covered by
instruction, whether the instruction was in text or classroom teaching.
Overlap was the number of items circled divided by the total number of
possible items, times 100. The estimates of 52 cases from the first
year of the study ranged from 2.70 to 100.00 percent (mean = 59.12;
s.d. = 32.73). In the second year of the study, the estimates ranged
from 7.14 to 100 percent for 105 cases (mean = 56.33; s.d. = 27.33).
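The student-level overlap score defined above (items circled, divided by total possible items, times 100) is simple enough to state directly. The item numbers and test length below are invented; `student_overlap` is a hypothetical helper.

```python
def student_overlap(circled_items, total_items):
    """Percent of test items the teacher circled as covered for one student."""
    circled = set(circled_items)  # ignore accidental duplicates
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if len(circled) > total_items:
        raise ValueError("more items circled than exist on the test")
    return 100.0 * len(circled) / total_items


# Hypothetical 40-item subtest.
print(student_overlap({3, 7, 12}, 40))       # a student taught little of the test
print(student_overlap(range(1, 41), 40))     # a student taught all of it
```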



Table 2

Correlations and Regressions
Using Teachers' Estimates of Overlap (LD)

Correlation Matrix (1977-1978)

                                           1       2       3       4

1. Pretest                               1.00     .73     .56     .82
2. Teacher's Estimate of Overlap          .73    1.00     .45     .81
3. Silent Reading (per 40 minutes)        .56     .45    1.00     .62
4. Posttest                               .82     .81     .62    1.00

Posttest* = 134.1 + .40 Pretest (.10) + .64 Overlap (.14) + 34.4 Silent Reading (12.8)

Adjusted R² = .79

Correlation Matrix (1978-1979)

                                           1       2       3       4

1. Pretest                               1.00     .40     .63     .83
2. Teacher's Estimate of Overlap          .40    1.00     .31     .50
3. Silent Reading (per day)               .63     .31    1.00     .63
4. Posttest                               .83     .50     .63    1.00

Posttest* = 177.6 + 6.2 Pretest (.66) + .40 Overlap (.12) + 1.1 Silent Reading (.45)

Adjusted R² = .72

*All of the coefficients and adjusted R² are significant at or below .05.
Standard errors are in parentheses.




Table 2 presents the correlations and regressions for the two
years of study. Overlap was obtained by teacher interviews for each
child and is the percentage of the test covered by instruction.
Silent reading is a measure of the amount of time a student spends
reading silently, and was obtained by observing individual students
during regular classroom instruction. This variable is included to
represent one of the most relevant aspects of instructional processes
as depicted in Figure 1. Overlap is again an important variable in
predicting end-of-year test performance. In this set of evaluative
research studies, the important aspect is not which program the
student was following, but the relationship between student behaviors
and reading performance. Here again, knowledge of the degree of
overlap is critical to interpreting the results.

Curriculum-Based Estimates of Overlap

Curriculum-based estimates of overlap are much harder to obtain
than teacher estimates, but, on the surface at least, they have more
objectivity and are less subject to bias. Two studies have used a
curriculum-based estimate, IDS (Cooley & Leinhardt, 1980; Poynor,
1978) and the first year of the LD reading study (Cooley et al.,
1979).

Lee Poynor developed a curriculum-based estimate of overlap at
the student level for IDS. The curriculum-based estimate used teacher
reports of content covered in each text for each student and matched
that content to the content of the test. The results are presented in
Table 3.




Table 3

Analysis of Curriculum-Based Estimates of Overlap (IDS)

                                Overlap                 Correlations
                                                 Overlap    Overlap    Pretest
                                                  with       with       with
                        n      mean     s.d.     Pretest   Posttest   Posttest

Grade 1 Read (R1)      104    27.13    14.91       .21        .42        .50
Grade 1 Math (M1)       84    15.03    10.38       .33        .38        .39
Grade 3 Read (R3)      109    20.02     5.60      -.05        .10        .86
Grade 3 Math (M3)      116    30.52    14.56       .30        .42        .78

Regression Equations(a)                                           Adjusted R²

Posttest(R1) = 31.95 + .70 Pretest (.13) + .20 Overlap (.05)          .34
Posttest(M1) = 21.4 + .52 Pretest (.19) + .16 Overlap (.06)           .20
Posttest(R3) = 11.8 + .94 Pretest (.05) + .25 Overlap (.08)           .75
Posttest(M3) = 17.2 + .95 Pretest (.08) + .15 Overlap (.04)           .65

(a) All of the coefficients and adjusted R² are significant at or below .05.
Standard errors are in parentheses.




Three interesting differences between Table 1 and Table 3 should be
pointed out. First, all estimates are lower using curriculum analyses
rather than teacher estimates. Second, Grade 1 math, while lowest, is
not as dramatically different from the rest of the curriculum-based
estimates as it is in the teacher estimates. Third, Grade 3 reading
overlap does not correlate with either pretest or posttest; however,
the regression coefficient is still significant. This is probably due
to the difficulty in estimating content covered prior to grade 3 in
reading. In order to estimate item-level overlap, some assumptions
must be made about what the student was exposed to prior to Book 3
Level 1, for example. The regression results are almost identical to
those obtained using the teacher estimate. Considering the
substantial differences in means and the totally different process of
gathering the information, this suggests that estimates are somewhat
stable regardless of technique.
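The first of these differences can be checked directly from the mean overlap values printed in Table 1 and Table 3. The short sketch below only subtracts the reported means; the dictionary keys are abbreviations chosen here for convenience.

```python
# Mean overlap (percent) copied from Table 1 (teacher) and Table 3 (curriculum).
teacher_mean = {"R1": 50.93, "M1": 27.59, "R3": 51.12, "M3": 56.14}
curriculum_mean = {"R1": 27.13, "M1": 15.03, "R3": 20.02, "M3": 30.52}

# Teacher estimate minus curriculum-based estimate, per grade and subject.
gap = {
    key: round(teacher_mean[key] - curriculum_mean[key], 2)
    for key in teacher_mean
}
for key, value in gap.items():
    print(key, value)
```

In every grade and subject the teacher-based mean exceeds the curriculum-based mean, by roughly 13 to 31 percentage points.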

A curriculum-based measure of overlap was also included in the
design of the first year of the study of reading in learning
disabilities classrooms (1977-1978). In January of 1978, teachers
were asked to list the major curricula used with each student. At the
time of posttesting, May, 1978, that list was verified and teachers
were asked where each student was in each curriculum at that point in
time (i.e., final location). For each level of each curricular
series, all words presented were entered into the computer with an
identifier indicating the series, level, and unit, chapter, or page in
which the word appeared. These words were then sorted in alphabetical
order and duplications were deleted based on the higher level
identifiers. These words then formed a dictionary of unique words
including the first presentation of each word only. Separate
dictionaries were compiled for each curriculum. Individual student
dictionaries were compiled to include those levels completed in each
curriculum the student used during the year, based on the student's
end-of-year location given by the teacher. These were matched with
the appropriate level of the posttest (CTBS).

Once all dictionaries had been entered, verified, sorted,
duplicates deleted and reverified, a computer program was designed by
Melanie Bowen to match individual student dictionaries with the CTBS
dictionaries. The program then calculated the percent of overlap in
several ways. The total test by item analysis (as opposed to a by
word analysis ignoring items) is reported here. The total test by
item measure of curricular overlap had a mean of 19.65 (s.d. = 14.73;
n = 52).
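The dictionary-matching procedure described above can be sketched as follows. This is not the original program: the word lists, series names, levels, and function names are all invented, and the rule that an item counts as covered only when every one of its words is in the student's dictionary is an assumption, since the text does not state the exact matching rule.

```python
def build_dictionary(presentations):
    """Keep only the first (lowest-level) presentation of each word."""
    first = {}
    for word, level in presentations:
        word = word.lower()
        if word not in first or level < first[word]:
            first[word] = level
    return first


def student_dictionary(curriculum_dicts, end_levels):
    """Words presented up to the student's end-of-year level in each curriculum."""
    known = set()
    for name, end_level in end_levels.items():
        known.update(
            w for w, level in curriculum_dicts[name].items() if level <= end_level
        )
    return known


def overlap_by_item(item_words, known):
    """Percent of items whose words were all presented (assumed rule)."""
    covered = sum(
        1 for words in item_words.values()
        if all(w.lower() in known for w in words)
    )
    return 100.0 * covered / len(item_words)


def overlap_by_word(item_words, known):
    """Percent of distinct test words presented, ignoring item boundaries."""
    test_words = {w.lower() for words in item_words.values() for w in words}
    return 100.0 * len(test_words & known) / len(test_words)


# Hypothetical curricula: (word, level at which it is presented).
dicts = {
    "Reader X": build_dictionary(
        [("cat", 1), ("dog", 1), ("house", 2), ("cat", 3), ("river", 3)]
    ),
    "Reader Y": build_dictionary([("sun", 1), ("tree", 2)]),
}
known = student_dictionary(dicts, {"Reader X": 2, "Reader Y": 1})
items = {
    "q1": ["cat", "dog"], "q2": ["house", "river"],
    "q3": ["sun"], "q4": ["tree", "cat"],
}
print(overlap_by_item(items, known), overlap_by_word(items, known))
```

Under the assumed rule the by-item figure (50 percent here) is lower than the by-word figure (about 67 percent), because a single unfamiliar word removes a whole item from the covered count.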

The results shown in Table 4 again indicate that while the means
are lower for curriculum-based estimates, the regression is
essentially the same. These results, coupled with the IDS results,
suggest that the mean curriculum-based overlap estimate is always
lower than the teacher estimate because it automatically leaves out
in-class instruction not found in textbooks, but that both estimates
do equally well in predicting posttest. Choosing which estimate is
better is a matter of either philosophy or money.






Table 4

Correlations and Regressions
Using Curriculum Estimates of Overlap

                                        1      2      3      4

1. Pretest                            1.00    .67    .56    .82

2. Curriculum Estimate of Overlap      .67   1.00    .59    .71

3. Silent Reading (per 40 minutes)     .56    .59   1.00    .62

4. Posttest                            .82    .71    .62   1.00


Posttest* = 107.6 + .59 Pretest + .70 Overlap + 28.2 Silent Reading
                   (.11)         (.36)         (15.6)

Adjusted R² = .72

*All of the coefficients and adjusted R² are significant at or below .05.
Standard errors are in parentheses.
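
As a worked illustration of the regression reported with Table 4, the short Python sketch below evaluates the fitted equation. The coefficients are taken from the table; the function name and the pretest and silent reading values in the example are hypothetical and chosen only to show the arithmetic. The overlap value of 19.65 is the sample mean reported in the text.

```python
def predicted_posttest(pretest: float, overlap: float, silent_reading: float) -> float:
    """Evaluate the Table 4 regression equation for one student."""
    return 107.6 + 0.59 * pretest + 0.70 * overlap + 28.2 * silent_reading


# Hypothetical student: only the overlap value changes between the two cases.
baseline = predicted_posttest(pretest=300.0, overlap=19.65, silent_reading=1.0)
higher = predicted_posttest(pretest=300.0, overlap=29.65, silent_reading=1.0)

# With pretest and silent reading held constant, ten more percentage
# points of overlap raise the predicted posttest by 0.70 * 10 points.
gain = round(higher - baseline, 6)
print(gain)  # 7.0
```

The comparison holds the other two predictors fixed, which is the sense in which the overlap coefficient is interpreted in the regression.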



Summary and Conclusions

In studying the effectiveness of different instructional
practices or programs, it is possible that an outcome measure may be
biased in favor of a particular practice or program because the
overlap between the test and one program is greater than for the
other(s). When this occurs, one program or set of practices may look
artificially better with respect to test performance than another.
Awareness of this problem is long standing. Effective ways of dealing
with the problem are just emerging.

In addition to test modification or construction, two basic
approaches for dealing with the overlap issue have emerged. The first
approach is a systematic analysis of curricula and tests to help guide
test selection. The most promising work for primary level mathematics
has been done by Porter and his colleagues. Information from this
analysis (if the analyses are expanded) can serve as a basis for
critiquing tests and curricula, and can aid policy analysts in
interpreting research results. A second approach to overlap is to
measure the degree of overlap directly, and incorporate such a measure
into the analysis. This approach could be used in conjunction with
the first, or alone.

Two ways to directly measure overlap have been developed. The
first involves teacher interviews or questionnaires. The second
involves analyzing the curriculum to assess whether information
required by the test has been covered by the curriculum. The teacher
interview approach reflects both informal in-class instruction and
curriculum-based instruction, but it may also include the teacher's
expectation about student competency. [It is worth noting that in the
study of reading instruction discussed earlier, an estimate of teacher
expectation for academic success failed to predict teachers' estimates
of overlap. Thus, overlap estimates seem to be freer of teacher bias
than we originally assumed (Leinhardt et al., 1980).] The curriculum
analysis approach is less likely to be biased, but it is costly,
time-consuming, and less likely to capture informal instruction. Both
approaches do equally well in predicting final test performance.

In addition to the two measurement approaches discussed, some
attention needs to be paid to the level at which the data are
collected. It is our conviction that student by item level data are
the easiest to collect and the most accurate. If time or cost preclude
gathering the information for each student on each item, then students
should be randomly sampled and the data aggregated. Having the teacher
estimate percentages of students requires the teacher to think of
groups of students, estimate the percentages they represent, and
average (2 out of 30 have all, 3 out of 30 have none, etc.). The task
is quite complex and likely to be prone to error.

In conclusion, in order to assure that evaluation results do not
misrepresent programmatic or instructional differences, it is vital to
include information about overlap in the analysis. Future work should
lead to the improvement of measurement techniques (perhaps including
frequency of presentation of information) and to a greater
understanding of what types and levels of criterion tasks will not be
predicted by simple estimates of overlap.

References



Armbruster, B. B., Stevens, R. J., & Rosenshine, B. Analyzing content
coverage and emphasis: A study of three curricula and two tests
(Technical Report 26). Urbana, IL: University of Illinois at
Urbana-Champaign, 1977.

Beck, I. & McCaslin, E. An analysis of dimensions that affect the
development of code-breaking ability in eight beginning reading
programs. Pittsburgh, PA: University of Pittsburgh, Learning
Research and Development Center, 1978. (LRDC Publication No.
1978/6)

Chang, S. S. & Raths, J. The school's contribution to the cumulating
deficit. Journal of Educational Research, 1971, 64, 272-276.

Cole, N. S. & Nitko, A. J. Instrumentation and bias: Issues in
selecting measures for educational evaluation. Paper presented
at the National Symposium on Educational Research, Johns Hopkins
University, November 1979.

Comber, L. C. & Keeves, J. P. Science education in nineteen countries.
International Studies in Evaluation. New York: John Wiley &
Sons, 1973.

Cooley, W. W. & Leinhardt, G. Design for the Individualized
Instruction Study: A study of the effectiveness of individualized
instruction in the teaching of reading and mathematics in
compensatory education programs. Final Report. Pittsburgh, PA:
University of Pittsburgh, Learning Research and Development
Center, 1975.

Cooley, W. W. & Leinhardt, G. The Instructional Dimensions
Study. Educational Evaluation and Policy Analysis, 1980, 2(1),
7-25.

Cooley, W. W., Leinhardt, G., & Zigmond, N. Explaining reading
performance of learning disabled students. Pittsburgh, PA:
University of Pittsburgh, Learning Research and Development
Center, 1979. (LRDC Publication No. 1979/12)

CTB/McGraw-Hill. Comprehensive Tests of Basic Skills. Monterey,
CA: McGraw-Hill, 1973.

Everett, B. B. A preliminary study of the relevance of a standardized
test for measuring achievement gains in innovative arithmetic
programs. Project Longstep Final Report: Volume II, Appendix
Report. Palo Alto, CA: American Institutes for Research, 1976.



Fisher, C. W., Berliner, D. C., Filby, N. N., Marliave, R., Cahen, L.
S., Dishaw, M. M., & Moore, J. E. Teaching and learning in the
elementary school: A summary of the Beginning Teacher Evaluation
Study. Beginning Teacher Evaluation Study, Technical Report
VII-1. San Francisco, CA: Far West Laboratory for Educational
Research and Development, 1978.

Floden, R. E., Porter, A. C., Schmidt, W. H., & Freeman, D. J. Don't
they all measure the same thing? Consequences of selecting
standardized tests. East Lansing, MI: Institute for Research on
Teaching, Michigan State University, July, 1978. (Research Series
No. 25)

Husen, T. (Ed.) International study of achievement in mathematics: A
comparison of twelve countries. Volume II. New York: John Wiley &
Sons, 1967.

Jenkins, J. R. & Pany, D. Curriculum biases in reading achievement
tests. Technical Report No. 16, Center for the Study of Reading,
University of Illinois at Urbana-Champaign, November, 1976.

Kugle, C. L. & Calkins, D. S. The effect of considering student
opportunity to learn in teacher behavior research (Research
Report No. 7). Austin, TX: University of Texas, Research and
Development Center for Teacher Education, 1976.

Kuhs, T., Schmidt, W., Porter, A., Floden, R., Freeman, D., &
Schwille, J. A taxonomy for classifying elementary school
mathematics content. East Lansing, MI: Institute for Research on
Teaching, Michigan State University, April, 1979. (Research
Series No. 4)

Leinhardt, G. & Engel, M. Iterative evaluation: NRS, an example. Paper
presented at the annual meeting of the American Educational
Research Association, Boston, April 1980.

Leinhardt, G., Zigmond, N., & Cooley, W. W. Reading instruction and
its effects. Paper presented at the annual meeting of the
American Educational Research Association, Boston, April 1980.

Marliave, R., Fisher, C., Filby, N., & Dishaw, M. The development of
instrumentation for a field study of teaching. Beginning Teacher
Evaluation Study, Technical Report I-5. San Francisco, CA: Far
West Laboratory for Educational Research and Development, 1977.

Pidgeon, D. A. Expectation and pupil performance. Stockholm: Almquist
& Wiksell, 1970.



Popham, W. J. The case for criterion-referenced measurements.
Educational Researcher, 1978, 7(11), 6-10.

Porter, A. C. Relationships between testing and the curriculum. East
Lansing, MI: Institute for Research on Teaching, Michigan State
University, July, 1978. (Occasional Paper No. 9)

Porter, A. C., Schmidt, W. H., Floden, R. E., & Freeman, D. J. Impact
on what? The importance of content covered. East Lansing, MI:
Michigan State University, Institute for Research on Teaching,
1978. (Research Series No. 2) (a)

Porter, A. C., Schmidt, W. H., Floden, R. E., & Freeman, D.
J. Practical significance in program evaluation. American
Educational Research Journal, 1978, 15(4), 529-539. (b)

Poynor, L. Instructional Dimensions Study: Data management procedures
as exemplified by curriculum analysis. Paper presented at the
annual meeting of the American Educational Research Association,
Toronto, April 1978.

Rosenshine, B. Academic engaged minutes, content covered, and direct
instruction. Unpublished manuscript, University of Illinois at
Urbana-Champaign, 1978.

Schmidt, W. H. Measuring the content of instruction. East Lansing,
MI: Institute for Research on Teaching, Michigan State
University, October, 1978. (Research Series No. 35)

Schwille, J., Porter, A., & Gant, M. Content decision-making and the
politics of education. East Lansing, MI: Michigan State
University, Institute for Research on Teaching, June,
1979. (Research Series No. 52)

Spache, G. D. Diagnostic reading scales. Monterey, CA:
CTB/McGraw-Hill, 1972.

Walker, D. F. & Schaffarzick, J. Comparing curricula. Review of
Educational Research, 1974, 44, 83-112.



Footnotes



1. Using a related measure of teacher diagnostic skill, the
researchers in the BTES study scored the teacher responses in
terms of their accuracy (Fisher, Berliner, Filby, Marliave,
Cahen, Dishaw, & Moore, 1978; Marliave, Fisher, Filby, &
Dishaw, 1977). The distinction is important. To obtain an
overlap measure, one merely sums the items the teacher
estimates have been covered by instruction. To obtain an
estimate of diagnostic skill (or hits), one sums the number
of items for which the teacher's estimate and student
performance concur. For example, if a teacher said none of
the material on a test had been taught, the overlap measure
would be zero; if the students missed all of the items on
such a test, the diagnostic score would be 100 percent.

2. A second measure calculated was the percent of hits, or
diagnostic ability. A hit was counted for each item for
which the teacher's estimate matched the actual performance
of the student. The percent of hits for 1977-1978 ranged
from 47.30 to 94.59 percent with a mean of 70.74 percent
(s.d. = 12.22). The correlation with posttest was .53 and .58
with pretest. In 1978-79 with 105 cases, the range was 43.24
to 91.89 percent with a mean of 64.30 percent (s.d. = 10.72),
and the correlation with the posttest is .20 and .14 with
pretest.
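
The distinction drawn in these footnotes between the overlap measure and the percent of hits can be made concrete with a short Python sketch. It assumes, for illustration only, that the teacher's item-by-item estimates and the student's item-by-item results are each coded as a list of true/false values; the function names are hypothetical and are not taken from the studies cited.

```python
from typing import Sequence


def overlap_score(taught: Sequence[bool]) -> int:
    """Number of test items the teacher estimates were covered by instruction."""
    return sum(1 for t in taught if t)


def percent_hits(taught: Sequence[bool], correct: Sequence[bool]) -> float:
    """Percent of items where the teacher's estimate and the student's result agree."""
    if len(taught) != len(correct):
        raise ValueError("Both sequences must describe the same test items.")
    if not taught:
        return 0.0
    hits = sum(1 for t, c in zip(taught, correct) if t == c)
    return 100.0 * hits / len(taught)


# The example from footnote 1: nothing was taught, and the student
# missed every item on a (hypothetical) five-item test.
taught = [False] * 5
correct = [False] * 5

print(overlap_score(taught))          # 0
print(percent_hits(taught, correct))  # 100.0
```

The same pair of inputs therefore yields an overlap of zero and a diagnostic score of 100 percent, which is why the two measures should not be treated as interchangeable.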


